On The Representation Of Query Term Relations By Soft Boolean Operators

Author

  • Gerard Salton
Abstract

The language analysis component in most text retrieval systems is confined to a recognition of noun phrases of the type normally included in back-of-the-book indexes, and an identification of related terms included in a preconstructed thesaurus of quasi-synonyms. Even such a restricted language analysis is fraught with difficulties because of the well-known problems in the analysis of compound nominals, and the hazards and cost of constructing word synonym classes valid for large text samples. In this study an extended (soft) Boolean logic is used for the formulation of information retrieval queries which is capable of representing both the use of compound noun phrases as well as the inclusion of synonym constructions in the query statements. The operations of the extended Boolean logic are described, and evaluation output is included to demonstrate the effectiveness of the extended logic compared with that of ordinary text retrieval systems.

Department of Computer Science, Cornell University, Ithaca, New York 14853. This study was supported in part by the National Science Foundation under grant IST-8316166.

I. Linguistic Approaches in Information Retrieval

It is possible to classify the various automatic text processing systems by the depth and type of linguistic analysis needed for their operations. Sophisticated language understanding components are believed to be essential to carry out automatic text transformations such as text abstracting and text translation. [1,14,24] Complete language understanding systems are also needed in automatic question-answering where direct responses to user queries are automatically generated by the system. [11] On the other hand, relatively less sophisticated language analysis systems may be adequate for bibliographic information retrieval, where references as opposed to direct answers are retrieved in response to user queries. [21] In bibliographic retrieval, the content of individual documents is normally represented by sets of key words, or key phrases, and only a few specified term relationships are recognized using preconstructed dictionaries or thesauruses. Even in this relatively simplified environment one does not normally undertake a linguistic analysis of any scope. In fact, syntactic and semantic analysis have been used in bibliographic information retrieval only under special circumstances to analyze query phrases [22], to process structured text samples of a certain kind [7,15], or finally to process texts in severely restricted topic areas. [2]

Where special conditions do not obtain, the preferred approach in information retrieval has been to use statistical or probabilistic criteria for the generation of the content identifiers assigned to documents and search queries. Obviously, not all terms are equally useful for content identification. According to the term discrimination theory, the following criteria are of importance in this connection [16]:

a) terms which occur with high frequency in the documents of a collection are not preferred for content representation because such terms are too broad to distinguish the documents from each other;

b) terms which occur with very low frequency in the collection are also not optimal, because such terms affect only a very small fraction of documents;

c) the best terms tend to be low-to-medium frequency entities which can be produced by taking single terms that exhibit the required frequency characteristics; alternatively, it is possible to obtain medium frequency entities by refining high frequency terms thereby rendering them more narrow, or by broadening low frequency terms.

In many operational information situations, the term broadening and narrowing operations are effectively carried out by using formulations in which the terms are connected by Boolean operators. The use of Boolean logic in retrieval is discussed in more detail in the remainder of this note.
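As a rough illustration of the broadening and narrowing operations just described, the following Python sketch counts document frequencies in a small collection and shows how a conjunction of two high frequency terms yields a narrower entity, while a disjunction of two low frequency terms yields a broader one. The toy collection, the term names, and the helper functions `document_frequency` and `postings` are hypothetical and are not taken from the paper. The sketch uses ordinary (strict) Boolean set operations only, not the extended logic.

```python
from collections import Counter

# Hypothetical toy collection: each document is a set of index terms.
docs = {
    "d1": {"information", "retrieval", "thesaurus"},
    "d2": {"information", "retrieval"},
    "d3": {"information", "retrieval", "query"},
    "d4": {"information", "language"},
    "d5": {"information", "syntax"},
    "d6": {"information", "dictionary"},
    "d7": {"retrieval", "language"},
    "d8": {"retrieval", "query"},
}


def document_frequency(collection):
    """Count the number of documents in which each term occurs."""
    df = Counter()
    for terms in collection.values():
        df.update(terms)
    return df


def postings(term, collection):
    """Return the set of document ids whose term set contains `term`."""
    return {doc_id for doc_id, terms in collection.items() if term in terms}


df = document_frequency(docs)

# Narrowing: AND two high frequency terms to obtain a medium frequency entity.
narrowed = postings("information", docs) & postings("retrieval", docs)

# Broadening: OR two low frequency terms to obtain a more frequent entity.
broadened = postings("thesaurus", docs) | postings("dictionary", docs)

print("df(information) =", df["information"], " df(retrieval) =", df["retrieval"])
print("information AND retrieval:", sorted(narrowed))
print("df(thesaurus) =", df["thesaurus"], " df(dictionary) =", df["dictionary"])
print("thesaurus OR dictionary:", sorted(broadened))
```

In this toy collection the conjunction matches fewer documents than either of its operands, and the disjunction matches more documents than either of its operands, which is the frequency adjustment that criterion c) calls for.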




Publication date: 1985